articulated robot progress

=robots =mechanical engineering

The "robot dance" was inspired by the movements of early robots. Today, Boston Dynamics has demos of robots dancing smoothly. What changed to make that possible?

When robots use geared electric motors to move joints, the motors must be geared down enough to produce the torque needed to move the robot. When the motors have a low power-to-weight ratio, a high gear ratio is required. When the gear ratio is very high, movement is slow and the motors run at their maximum speed much of the time, which results in the slow, mostly-constant-speed movements of a "robot dance".
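To make that tradeoff concrete, here's a rough sketch of the arithmetic. All the numbers (joint torque, motor torques, motor speed) are made-up values for illustration, and the gears are assumed lossless; the point is only that a weaker motor of the same weight needs a higher gear ratio, which caps the joint speed.

```python
def gear_ratio_needed(joint_torque_nm, motor_torque_nm):
    """Gear ratio needed for the motor to deliver the joint torque (ideal, lossless gears)."""
    return joint_torque_nm / motor_torque_nm


def joint_speed_deg_per_s(motor_max_rpm, gear_ratio):
    """Joint speed when the motor is spinning at its maximum speed."""
    joint_rpm = motor_max_rpm / gear_ratio
    return joint_rpm * 360.0 / 60.0


# Hypothetical numbers, not taken from any real robot.
JOINT_TORQUE_NM = 100.0
MOTOR_MAX_RPM = 3000.0

# Two motors of the same weight: an older, weaker one and a modern, stronger one.
weak_ratio = gear_ratio_needed(JOINT_TORQUE_NM, motor_torque_nm=0.25)
strong_ratio = gear_ratio_needed(JOINT_TORQUE_NM, motor_torque_nm=2.0)

weak_speed = joint_speed_deg_per_s(MOTOR_MAX_RPM, weak_ratio)
strong_speed = joint_speed_deg_per_s(MOTOR_MAX_RPM, strong_ratio)

print(f"weak motor:   ratio {weak_ratio:.0f}:1, top joint speed {weak_speed:.0f} deg/s")
print(f"strong motor: ratio {strong_ratio:.0f}:1, top joint speed {strong_speed:.0f} deg/s")
```

With these assumed values, the weak motor needs a 400:1 reduction and tops out at 45 degrees per second, while the strong motor needs 50:1 and can reach 360 degrees per second.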

When early robots used a centralized hydraulic system, movement was controlled by on-off valves with flow limiters, which give a mostly constant flow rate with some variation from load. This results in constant movement speeds with rapid, jerky acceleration, especially when stopping. You've probably seen hydraulic excavators, which still use a similar system today: when they move, you can see constant-speed movements with somewhat jerky acceleration.

The power-weight ratio of electric motors has improved greatly since 1950, and the biggest reason is rare-earth permanent magnets. Power electronics for electric motors have also improved greatly.

However, even if electric motors produce a lot of power (at a high rpm) you still need gears, and if the gears are heavy, then the overall system is still heavy. The weight of gears is mostly proportional to the torque they carry, so the last stage, which carries the highest torque, is the heavy part.
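Here's a small sketch of why the last stage dominates. It assumes gear mass is exactly proportional to output torque and uses a hypothetical three-stage gear train with a 5:1 reduction per stage; real gear trains won't follow this so neatly.

```python
def stage_mass_shares(stage_ratios):
    """Fraction of total gear mass in each stage, assuming mass is proportional to output torque."""
    torque = 1.0  # motor torque in arbitrary units
    output_torques = []
    for ratio in stage_ratios:
        torque *= ratio  # each reduction stage multiplies the torque
        output_torques.append(torque)
    total = sum(output_torques)
    return [t / total for t in output_torques]


# Hypothetical gear train: three stages, each a 5:1 reduction.
shares = stage_mass_shares([5, 5, 5])
for stage_number, share in enumerate(shares, start=1):
    print(f"stage {stage_number}: {share:.0%} of gear mass")
```

Under these assumptions the final stage accounts for about 81% of the gear mass, so that's the stage where a lighter gear type pays off most.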

Today, those Boston Dynamics robots mainly use strain wave gearing, often known by the brand name "Harmonic Drive". Planetary gears and cycloidal drives typically give ~50 newton-meters/kg, but strain wave gears can give >500 Nm/kg. If you look at the animation on the Wikipedia page for strain wave gearing, you can see that strain wave gears have a lot of sliding under high force. This means their efficiency is generally worse than that of an equivalent set of planetary gears. Because the coefficient of friction is critical, the surfaces must be very smooth, and machining to those fine tolerances is somewhat expensive. Their lifetimes are also generally shorter than those of planetary gears and cycloidal drives. So, they're not usually preferred over planetary gears when weight isn't critical.
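For a sense of scale, here's the weight difference those torque densities imply. The two densities are the rough figures from the paragraph above (with 500 Nm/kg as a lower bound for strain wave gears); the 100 Nm joint torque is a made-up value.

```python
PLANETARY_NM_PER_KG = 50.0     # rough figure for planetary and cycloidal drives
STRAIN_WAVE_NM_PER_KG = 500.0  # lower bound quoted for strain wave gears


def gearbox_mass_kg(output_torque_nm, torque_density_nm_per_kg):
    """Gearbox mass implied by a torque density figure."""
    return output_torque_nm / torque_density_nm_per_kg


JOINT_TORQUE_NM = 100.0  # hypothetical joint requirement

planetary_mass = gearbox_mass_kg(JOINT_TORQUE_NM, PLANETARY_NM_PER_KG)
strain_wave_mass = gearbox_mass_kg(JOINT_TORQUE_NM, STRAIN_WAVE_NM_PER_KG)

print(f"planetary or cycloidal: {planetary_mass:.1f} kg")
print(f"strain wave:            {strain_wave_mass:.1f} kg or less")
```

That's about 2 kg versus 0.2 kg per joint, which adds up quickly on a robot with many joints, especially ones far out on a limb.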

Some robotic arms today use both cycloidal drives and, later in the kinematic chain where weight is more important, strain wave gears.

My understanding is that the Boston Dynamics "Spot" robot also uses planetary roller screws to drive the shoulder joints. Those are high-force linear actuators that are like leadscrews but more efficient. Here's an animation of how they work. Can you tell why they're more efficient than leadscrews?

If you paid attention to the animation, you should see that, while it has rolling movement, a planetary roller screw also must have as much sliding as a leadscrew, since threads rolling around without sliding don't produce net movement. Rather, the improved efficiency comes from a reduced coefficient of friction, which is caused by the rolling movement putting oil between the threads, producing hydrodynamic lubrication. This requires very smooth surfaces, and machining to those fine tolerances makes planetary roller screws relatively expensive.
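To see how much the coefficient of friction matters, here's a sketch using the textbook efficiency formula for a square-thread screw raising a load: efficiency = tan(lead angle) / tan(lead angle + friction angle), where the friction angle is atan(coefficient of friction). This is a simplified sliding-contact model, not a model of a real roller screw, and the lead angle and both friction coefficients are assumed values for illustration.

```python
import math


def screw_efficiency(lead_angle_deg, friction_coefficient):
    """Efficiency of an idealized square-thread screw raising a load.

    Ignores thread flank angle, collar friction, and bearing losses.
    """
    lead_angle = math.radians(lead_angle_deg)
    friction_angle = math.atan(friction_coefficient)
    return math.tan(lead_angle) / math.tan(lead_angle + friction_angle)


LEAD_ANGLE_DEG = 5.0  # hypothetical thread geometry

# Assumed friction coefficients: ordinary sliding contact vs. a hydrodynamic oil film.
sliding_efficiency = screw_efficiency(LEAD_ANGLE_DEG, friction_coefficient=0.15)
oil_film_efficiency = screw_efficiency(LEAD_ANGLE_DEG, friction_coefficient=0.01)

print(f"sliding contact (mu = 0.15): {sliding_efficiency:.0%}")
print(f"oil film        (mu = 0.01): {oil_film_efficiency:.0%}")
```

With the same thread geometry and the same amount of sliding, dropping the friction coefficient from 0.15 to 0.01 takes the efficiency from roughly 36% to roughly 90% in this model.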

Most mechanical devices were already invented a century ago, but CNC machining, progress in metallurgy, cheap bearings, and better tolerances can make new things possible. I think it's possible to do better than strain wave gears for articulated robots.

A lot of people look at progress in robotics in terms like "humanoid robots getting better over time", but a robotic arm using modern electric motors and strain wave gears is, in terms of technological progress, a lot closer to Boston Dynamics's Atlas robot than an early humanoid robot is.

There's also been progress in controlling robots. CPUs have gotten much faster since the early humanoid robots, and some relevant algorithms need fast CPUs. Boston Dynamics hasn't used neural networks for controlling joint movements, but it does use neural network image segmentation to process camera input, using standard U-Net architectures which are only practical because of progress in GPUs.